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Abstract 

We consider a linear recursion of the form 

N 

R (k+i) v + Q, 

i= 1 

where ( Q , A, Ci, C 2 , ■ ■ ■) is a real-valued random vector with N e N = {0,1,2,... }, {ijjffiteN is 

a sequence of i.i.d. copies of , independent of (Q, N, C\, C 2 ,...), and = denotes equality in 
distribution. For suitable vectors ( Q , N, C\, C 2 , ■ ■ ■) and provided the initial distribution of R 0) 
is well-behaved, the process R^ is known to converge to the endogenous solution of the corre¬ 
sponding stochastic fixed-point equation, which appears in the analysis of information ranking 
algorithms, e.g., PageRank, and in the complexity analysis of divide and conquer algorithms, 
e.g. Quicksort. Naive Monte Carlo simulation of lh fc ) based on the branching recursion has ex¬ 
ponential complexity in /c, and therefore the need for efficient methods. We propose in this paper 
an iterative bootstrap algorithm that has linear complexity and can be used to approximately 
sample R^ k \ We show the consistency of estimators based on our proposed algorithm. 


1 Introduction 


The complexity analysis of divide and conquer algorithms such as Quicksort UDEHE] and the more 
recent analysis of information ranking algorithms on complex graphs (e.g., Google’s PageRank) 
uadis] motivate the analysis of the stochastic fixed-point equation 

N 

R = X C r R r + Q, (1.1) 

r =1 


where (Q, N, C±, C 2 , ■ ■ ■) is a real-valued random vector with JVeN, and is a sequence of 

i.i.d. copies of R, independent of (Q, JV, Ci, C 2 ,...). More precisely, the number of comparisons 
required in Quicksort for sorting an array of length n, properly normalized, satisfies in the limit as 
the array’s length grows to infinity a distributional equation of the form in (1.1). In the context 
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of ranking algorithms, it has been shown that the rank of a randomly chosen node in a large 
directed graph with n nodes converges in distribution, as the size of the graph grows, to R, where 
N represents the in-degree of the chosen node and the {Q} ? ;>i are functions of the out-degrees of 
its neighbors. 


Although equation (1.1) is known to have multiple solutions, and an extensive amount of literature 
has been devoted to their characterization (see e.g. mm® and the references therein), in applica¬ 
tions we are often interested only in the so-called endogenous solution. This solution can be shown 
to be the unique limit under iterations of the distributional recursion 


N 

R {k+ 1) = J2 QRp + Q, 


i =1 


( 1 . 2 ) 


where (Q, N, C\, C 2 , ■ ■ ■) is a real-valued random vector with N € N, and {R- k) is a sequence 
of i.i.d. copies of R^ k \ independent of ( Q , N, C±, C 2 , ■ ■ ■ ), provided one starts with an initial distri¬ 
bution for R^ with sufficient finite moments (see, e.g., Lemma 4.5 in [8]). Moreover, asymptotics 
for the tail distribution of the endogenous solution R are available under several different sets of as¬ 
sumptions for (Q, N, C\, C 2 , ■ ■ ■) [3 El0HO]. However, no approximations exist for the distribution 
of R besides its tail behavior, and even the calculation of its non-integer/absolute moments can be 
difficult. Hence the need to design efficient numerical methods to compute relevant statistics. 


As will be discussed later, the endogenous solution to (1.1) can be explicitly constructed on a 


weighted branching process. Thus, drawing some similarities with the analysis of branching pro¬ 
cesses, and the Galton-Watson process in particular, one could think of using the Laplace transform 
of R to obtain its distribution. Unfortunately, the presence of the weights {C/} in the Laplace trans¬ 
form 


ip(s) = E [exp (—si?)] = E 


N 


exp (-sQ) JJ^(sCi) 


i= 1 


makes its inversion problematic, making a simulation approach even more important. 

The first observation we make regarding the simulation of R, is that when P(Q = 0) < 1 it is enough 
to be able to approximate R^ for fixed values of k, since both R^ and R can be constructed 
in the same probability space in such a way that the difference | R^ — i?| is geometrically small. 
More precisely, under very general conditions (see Proposition |2.1| in Section [ 2 ]), there exist positive 
constants K < 00 and c < 1 such that 


E 


R^ - R 


A 


< Kc k+1 . 


(1.3) 


Our goal is then to simulate R^ for a suitably large value of k. 

The simulation of R^ is not that straightforward either, since naive Monte Carlo using (1.2) 
starting from some initial distribution R 1°' implies the computation of a geometric number of 
copies of (Q,N,Ci,C 2 ,...), of order (E[N]) k when E[N] > 1, which is usually the case in the 
applications we are interested in. Hence, the naive simulation approach can be prohibitive. Instead, 
we propose in this paper an iterative bootstrap algorithm that outputs a sample pool of observations 
i w ] loge eni pi r i ca l distribution converges, in the Kantorovich-Rubinstein distance, to that 
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Figure 1: Weighted branching process 


of R^ k > as the size of the pool m —> oo. This mode of convergence is equivalent to weak convergence 
and convergence of the first absolute moments (see, e.g., E2]). Moreover, the complexity of our 
proposed algorithm is linear in k. 

The paper is organized as follows. Section [2] describes the weighted branching process and the 
linear recursion. The algorithm itself is given in Section [3] . Section [4] introduces the Kantorovich- 
Rubinstein distance and proves the convergence properties of our proposed algorithm. Numerical 
examples to illustrate the precision of the algorithm are presented in Section [5] 


2 Linear recursions on weighted branching processes 


As mentioned in the introduction, the endogenous solution to (1.1) can be explicitly constructed 
on a weighted branching process. To describe the structure of a weighted branching process, let 
N+ = (1, 2, 3,...} be the set of positive integers and let U = U/tLo(^+) fc be the set of all finite 
sequences i = (*i, * 2 , • ■ ■, in), n > 0, where by convention N+ = {0} contains the null sequence 0. To 
ease the exposition, we will use (i, j) = (R,... ,i n ,j) to denote the index concatenation operation. 

Next, let (Q, N, C\, C 2 ,...) be a real-valued vector with IV 6 N. We will refer to this vector as 
the generic branching vector. Now let {(Qi, N\, C7i,2), ■ ■ ■ )}iec be a sequence of i.i.d. copies 

of the generic branching vector. To construct a weighted branching process we start by defining a 
tree as follows: let Aq = {0} denote the root of the tree, and define the nth generation according 
to the recursion 

A n = {(i, i n ) € U : i € A n _ 1 ,1 < i n < W}, n > 1. 

Now, assign to each node i in the tree a weight Ili according to the recursion 


— 1) C( Mf0 IIi, Tl 1, 

see Figure [lj If P(N < 00) = 1 and Ci = 1 for all i > 1, the weighted branching process reduces 
to a Galton-Watson process. 

For a weighted branching process with generic branching vector (Q, N, C±, C 2 ,...), define the pro¬ 
cess { R : k > 0} as follows: 

k 

R(k) = EE Qi n i> k > 0. (2.1) 

j=0 i eAj 
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By focusing on the branching vector belonging to the root node, i.e., (Q%, -ZVjj, C\, C 2 ,...) we can 
see that the process satisfies the distributional equations 

R {0) =Qq = Q 

Nq / k \ N 

= E E Q{rA)n<r,i)/Cr +Q 0 = ^C r 4 fc - 1) + Q, k> 1, (2.2) 

r =1 \i=l (ri)eA, / r=l 

where ^ are i.i.d. copies of all independent of (Q, N, Ci, C 2 , ■ ■ ■). Here and throughout 

the paper the convention is that XY/Y = 1 if Y = 0. Moreover, if we define 

OO 

i? = EE^ n - m 

j= 0i eAj 

we have the following result. We use x V y to denote the maximum of x and y. 


Proposition 2.1 Let (3 > 1 be such that £7[|Q|^] < 00 and E IQl) j <00. In addition, 

assume either (i) (pi V pp) < 1 , or (ii) /3 = 2, p\ = 1, pp < 1 and E[Q\ = 0. Then, there exist 
constants Kp > 0 and 0 < cp < 1 such that for R^ and R defined according to (2.1) and (2.3), 
respectively, we have 


sup E 
k> 0 


| RWf 


< Kp < 00 


and 


E 


\R(V - Rf 


< Kpc 


k-\-l 

p 


Proof. For the case p\M pp < 1, Lemma 4.4 in [8] gives that for W n = QiHi and some finite 

constant Hp we have 


E 


\W n f <Hp( Pl V pp) n . 


Let cp = pi V pp. Minkowski’s inequality then gives 

k 00 




VP 


n =0 


72—0 


1 — c 


1 IP 


1 IP 


= (Kp) 1 ^ < 00 . 


Similarly, 




n=k-\-l 


n=k-\-l 


Ha 


\ 1 IP 


= ( Kpc k p +1 


VP 


For the case /3 = 2, pi = 1, pp < 1 and E[Q\ = 0 we have that 



J? 

•© 

to 


' n 9 

E [W 2 ] = E 

E C rW n -l,r 

= E 

E^(W„_ 1,,) 2 + E C r C s W n -i, r W n -i,s 


LW J \ 


r= 1 1<^7 
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where W n -i >r = Yl( r i)eA n Q(r,i)^(r,i)/C r , and the {W n - i, r }r>i are i.i.d. copies of W n - 1 , indepen¬ 
dent of (Nq, Ci, C 2 , ■ ■ ■)■ Since £^[W n ] = 0 for all n > 0, it follows that 

E[Wn] = P2E[W 2 n _ l ] = p n 2 E[Wl\ = Var (Q)p*. 

The two results now follow from the same arguments used above with H -2 = Var(Q) and C 2 = P 2 - 


It follows from the previous result that under the conditions of Proposition 2.1 R^ converges to 


R both almost surely and in L^- norm. Similarly, if we ignore the Q in the generic branching vector, 
assume that Ci> 0 for all i, and define the process 


Ni ( \ N 

= Y n ; = Y c r l Y n w/Cr = Y cM k ~ l \ 

ie A k r= 1 \(r,i)eAfc ) r =1 

where the {Wj k ^ } r >i are i.i.d. copies of W^ k 1 ' independent of (N, C\, C' 2 ,...), then it can be 
shown that {W^/p k : k > 0} defines a nonnegative martingale which converges almost surely to 
the endogenous solution of the stochastic fixed-point equation 


N 


where the {Wi}i> \ are i.i.d. copies of W, independent of (N, C\,C 2 , ...). We refer to this equation 
as the homogeneous case. 

As mentioned in the introduction, our objective is to generate a sample of R.^ for values of k 
sufficiently large to suitably approximate R. Our proposed algorithm can also be used to simulate 
W( k \ but due to space limitations we will omit the details. 


3 The algorithm 


Note that based on (2.1), one way to simulate R^ would be to simulate a weighted branching 


process starting from the root and up to the k generation and then add all the weights Qilb for 

i e 

represent the Q, for i e A}., and use them to generate a pool of i.i.d. observations of R ^ by setting 


0 Aj. Alternatively, we could generate a large enough pool of i.i.d. copies of Q which would 


Ni 

=Y C (i,r)4 0) + Qi, 

r =1 

where {(Qi, Ni, Cuy, C(i, 2)1 • • • )}*>i are i.i.d. copies of the generic branching vector, independent 

of everything else, and the rY are the Q 1 s generated in the previous step. We can continue this 
process until we get to the root node. On average, we would need (E[N]) k i.i.d. copies of Q for 
the first pool of observations, (^[W])^ 1 copies of the generic branching vector for the second pool, 
and in general, ( E[N]) k ~ J ' for the jth step. This approach is equivalent to simulating the weighted 
branching process starting from the fcth generation and going up to the root, and is the result of 


iterating (1.2). 
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Our proposed algorithm is based on this “leaves to root” approach, but to avoid the need for a 
geometric number of “leaves”, we will resample from the initial pool to obtain a pool of the same 
size of observations of RS 1 '. In general, for the jth generation we will sample from the pool obtained 
in the previous step of (approximate) observations of to obtain conditionally independent 

(approximate) copies of R^\ In other words, to obtain a pool of approximate copies of we 
bootstrap from the pool previously obtained of approximate copies of R^~ l \ The approximation 
lies in the fact that we are not sampling from R^~R itself, but from a finite sample of conditionally 
independent observations that are only approximately distributed as i?C? — The algorithm is 
described below. 

Let (Q, N, C\, C' 2 , • • •) denote the generic branching vector defining the weighted branching process. 
Let k be the depth of the recursion that we want to simulate, i.e., the algorithm will produce a 
sample of random variables approximately distributed as R^ k \ Choose m £ N+ to be the bootstrap 
sample size. For each 0 < j < k, the algorithm outputs = ^R^’ m \ R^’ m \ ..., Rm ™^ , which 

we refer to as the sample pool at level j. 


1. Initialize: Set j = 0. Simulate a sequence of i.i.d. copies of Q and let = Qi 

for i = 1,..., m. Output 7 = ^R^ >,m \ R 2 °’ m \ ..., Rm ’ m ^^ and update j = 1. 

2. While j < k: 


i) 


ii) 


Simulate a sequence {(Qi, Ni, Cu^, Cup), ... )}™ x of i.i.d. copies of the generic branch¬ 
ing vector, independent of everything else. 

Let 


Ni 

#"*> = £C, 

r—1 


R ( tJ’ m) + Qi, 




i = 1, • 


, m, 


(3.1) 


iii) 


where the R^ ir ^’ m ^ are sampled uniformly with replacement from the pool 1 > m \ 
Output = (^R^’ m \ R^' m \ ..., Rrn ni} and update j = j + 1. 


Bootstrapping refers broadly to any method that relies on random sampling with replacement [5j. 
For example, bootstrapping can be used to estimate the variance of an estimator, by constructing 
samples of the estimator from a number of resamples of the original dataset with replacement. 
With the same idea, our algorithm draws samples uniformly with replacement from the previous 


bootstrap sample pool. Therefore, the R 


(i,r) 


on the right-hand side of (3.1) are only condition¬ 


ally independent given V^ 1,m ). Hence, the samples in are identically distributed but not 

independent for j > 1. 


As we mentioned earlier, the distribution of the {R^’ m ^} in , p(I rn l are only approximately dis¬ 
tributed as R^\ with the exception of the which are exact. The first thing that we need 

to prove is that the distribution of the observations in does indeed converge to that of R^\ 

Intuitively, this should be the case since the empirical distribution of the is the empirical 

distribution of m i.i.d. observations of R^°\ and therefore should be close to the true distribution 
of R(°1 for suitably large m. Similarly, since the {^(L” 1 )} are constructed by sampling from the 
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empirical distribution of 'P^ ))in \ which is close to the true distribution of R^\ then their empirical 
distribution should be close to the empirical distribution of R^ l \ which in turn should be close to 
the true distribution of IT 1 '. Inductively, provided the approximation is good in step j — 1, we can 
expect the empirical distribution of R^- m ) to be close to the true distribution of R^\ In the follow¬ 
ing section we make the mode of the convergence precise by considering the Kantorovich-Rubinstein 
distance between the empirical distribution of and the true distribution of R^\ 


The second technical aspect of our proposed algorithm is the lack of independence among the 
observations in 'p( k ’ m \ since a natural estimator for quantities of the form E[h{R^)\ would be to 
use 


1 

771 . 




(3.2) 


i =1 


Hence, we also provide a result establishing the consistency of estimators of the form in (3.2) for a 
suitable family of functions h. 


We conclude this section by pointing out that the complexity of the algorithm described above is 
of order km, while the naive Monte Carlo approach has order (E[N]) k m. This is a huge gain in 
efficiency. 


4 Convergence and consistency 

In order to show that our proposed algorithm does indeed produce observations that are approxi¬ 
mately distributed as R^ for any fixed k, we will show that the empirical distribution function of 
the observations in , i.e., 

^ m 

F k ,m{x) = - Y. l(£?’ m) < x ) 
i =1 

converges as m —> oo to the true distribution function of R^ k \ which we will denote by T).. We 
will show this by using the Kantorovich-Rubinstein distance, which is a metric on the space of 
probability measures. In particular, convergence in this sense is equivalent to weak convergence 
plus convergence of the first absolute moments. 

Definition 1 let M (/j, is) denote the set of joint probability measures ontxR with marginals p 
and is. then, the Kantorovich-Rubinstein distance between p and is is given by 

di(n,is)= inf / \x - y\dir(x,y). 
neM{n,v) Jk x r 

We point out that d\ is only strictly speaking a distance when both fi and is have finite first absolute 
moments. Moreover, it is well known that 

cl COO 

o?i(/x, is) = / |F _1 (ii) — G~ 1 (u)\du = / \F(x) — G(x)\dx, (4-1) 

J 0 J —oo 

where F and G are the cumulative distribution functions of p and is, respectively, and f~ 1 (t) = 
inf{a: G M : f(x) > t} denotes the pseudo-inverse of /. It follows that the optimal coupling of 
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two real random variables X and Y is given by ( X , Y) = (F 1 (U), G 1 (U)), where U is uniformly 
distributed in [0,1]. 

Remark 4.1 The Kantorovich-Rubinstein distance is also known as the Wasserstein metric of 
order 1. In general, both the Kantorovich-Rubinstein distance and the more general Wasserstein 
metric of order p can be defined in any metric space; we restrict our definition in this paper to the 
real line since that is all we need. We refer the interested reader to m for more details. 

With some abuse of notation, for two distribution functions F and G we use di(F,G) to denote 
the Kantorovich-Rubinstein distance between their corresponding probability measures. 

The following proposition shows that for i.i.d. samples, the expected value of the Kantorovich- 
Rubinstein distance between the empirical distribution function and the true distribution converges 
to zero. 


Proposition 4.2 Let {X t } t > i be a sequence of i.i.d. random variables with common distribution 
F. Let F n denote the empirical distribution function of a sample of size n. Then, provided there 
exists a £ (1, 2) such that E [|Xi|“] < oo, we have that 


E[d 1 (F n ,F)}<n~ 1+1 / a {^r + 2 


a — 1 2 — a 


EUif 


Proposition 4.2 can be proved following the same arguments used in the proof of Theorem 2.2 in 
|4j by setting M = 1, and thus we omit it. 

We now give the main theorem of the paper, which establishes the convergence of the expected 
Kantorovich-Rubinstein distance between F k rn and F k . Its proof is based on induction and the 

explicit representation (4.1). Recall that pp = E |Ci|^ ■ 


Theorem 4.3 Suppose that the conditions of Proposition 2.1 are satisfied for some (3 > 1. Then, 
for any a £ (1,2) with a < (3, there exists a constant K a < oo such that 


E 


di(F k , m ,F k )] <K a m~ 1+1 / a YlPi- 


(4.2) 


i=0 


Proof. By Proposition |2.1| there exists a constant H a such that 


H a = sup E 
k> o 


|4?( fc )| c 


< sup (E 
k> o 


\ R (k)\h 


a/p 


< OO. 


Set K a = H a [^ I + EL 


We will give a proof by induction. 















where {Qi}i> 1 is a sequence of i.i.d. cop ies of Q. It follows that Fo, m is the empirical distribution 
function of R^°\ and by Proposition 4.2 we have that 


E 


dl(Fo,m, Fo) 


< K n m~ 1+1/a . 


Now suppose that (4.2) holds for j — 1. Let {t/*}i, r >i be a sequence of i.i.d. Uniform(0,1) random 
variables, independent of everything else. Let {(Qi, Ni , Ct i 2 )i ■ ■ ■ )}i>i be a sequence of i.i.d. 

copies of the generic branching vector, also independent of everything else. Recall that Fj _i is the 
distribution function of R b _1 ) and dehne the random variables 


?(L m ) 


Ni 


Ni 


E ?"' 1 = E + Qi and R<f> = £ C (i , r) Fr_\(U l r ) + Qi 

r=1 r— 1 

for each i = 1, 2, ..., m. Now use these random variables to define 

m 1 m 

Fj,m{x) = V] < x ) and F jm (x) = V' l(R- j) < ®). 

777 ^ m z ^ 




m 


i =1 


Note that is an empirical distribution function of i.i.d. copies of R^\ which has been carefully 
coupled with the function Fj. m produced by the algorithm. 


By the triangle inequality and Proposition 4.2 we have that 

di(F j>m ,Fj) <E di(F jtm ,F jim ) + E [di(F jjTn , Fj)\ < E d , ( . F j: 


E 


+ K a m~ 1+1/a . 


To analyze the remaining expectation note that 


E 


di(Fj ,m 5 Fj,i 


= E 


1 

m 

1 

m 

< E 


/ | Ej m (x 

) - -Pj-.m 

m 

/*oo 



/ 


i=l 

«/ —oo 


m 

r?0) 

K i ~ K i 

1=1 



m 

Ni 


£* 

E C f.i,r)(R- 

i= 1 

r=l 


r ^ 

- 



dx 




r =1 


E 


d\ (Pj_i,m, Fj —\) 


where in the last step we used the fact that (Ni, 2 ),...) is independent of {£/*} >;L and of 

Fj-\,m, combined with the explicit representation of the Kantorovich-Rubinstein distance given in 
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(4.1). The induction hypothesis now gives 


di(Fj )7n ,Fj) < p\E di(Fj-i }m , Fj-i) 


+ K a m~ 1+1/a 


3 ~ 1 


<K a m- 1+1 / a Pl Y^p[ + K 0 


m 


-1+1/a 


i =0 


= K a m- 1+1 / a Y,p\. 


i =0 


This completes the proof. ■ 

Note that the proof of Theorem 
in L 1 -norm for all fixed j £ N, and 


4.3 


implies that R^ m) -+ C^Ff^U*) + Qi = R® 

rence in distribution. In other words, 


P ( < x ) -+ Fk(x) as m —> oo, (4.3) 


for all i = 1, 2,..., m, and for any continuity point of T) 0 . This also implies that 


Fk,m(x ) = P iR[ ' J < x ) —> Fk(x) asm-> oo, 


(4.4) 


for all continuity points of fy,. 

Since our algorithm produces a pool p( k ’ m l Q f m random variables approximately distributed ac¬ 
cording to Fk, it makes sense to use it for estimating expectations related to R^ k \ In particular, 


we are interested in estimators of the form in (3.2). The problem with this kind of estimators is 
that the random variables in are only conditionally independent given F ^—i m . 


P P 

Definition 2 We say that Q n is a consistent estimator for 6 if Q n -+ 6 as n -+ oo, where -+ 
denotes convergence in probability. 


Our second theorem shows the consistency of estimators of the form in (3.2) for a broad class of 
functions. 


Theorem 4.4 Suppose that the conditions of Proposition 2.1 are satisfied for some /3 > 1. Suppose 
h : M — > M is continuous and \h(x)\ < C( 1 + |x|) for all and some constant C > 0. Then, 

the estimator 

— = / h(x)dF kj m(x), 

m i= i 

where V= (R^’ m \ R^’ m \ ..., R^ m ^\, is a consistent estimator for E[h(R^)\. 


Proof. For any M > 0, define hM(x) as 

h M (x) = h(-M) l(x < —M) + h(x)l(-M < x < M ) + h(M)l(x > M ), 
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and note that h m is uniformly continuous. We then have 


h(x)dF k ,m{x) - / h(x)dF k {x ) 


< 2 C (1 + \x\)dF k (x) + 2C / (1 + | x\)dF kjTn (x) 

J\x\>M J\x\>M 


+ 


h M (x)dF k:Tn (x ) - / h M (x)dF k (x) 


(4.5) 


Fix e > 0 and choose M e > 0 such that E [(|i?( fc )| + l)l(|i?i fe i| > M e )] < e/(4C) and such that 
—M e and M e are continuity points of F*,. Define (R/ k,m \ R^) = (F^^U), F// 1 ([/)), where C7 is a 
uniform [0,1] random variable independent of Next, note that g(x) = 1 + |x| is Lipschitz 

continuous with Lipschitz constant one and therefore 


'\x\ >M e 


(1 + \x\)dFk,m(x ) — (1 + M e ) I Fk, m (—M e ) + 1 — F k:m (M ( 


+ / F k m (x') dx + / (1 F krn (x')')dx 

Jx<—M t Jx>M e 

< (1 + M e ) (F kj m(—M e ) + 1 — Fk,m(M e ) \ + di{F k ^m F k ) 


+ f F k (x)dx+ I (1 - F k (x))dx 

Jx<—M e Jx>M e 


— (1 + M e ) ( F k , m (—M e ) — F k (—M e ) + F k (M e ) — F kjTn (M e )J + di(F k ^ m , F k ) 


+ E 


(|F (fc) | + l)l(|F (fc) | > M e 


Finally, since is bounded and uniformly continuous, then oj{8) = sup{| hM t { x ) ~ hM e (y )I : 
\x — y\ < 5} converges to zero as 5 —> 0. Hence, for any 7 > 0, 


h M Ax)dF k ,m(x) - / h Me (x)dF k (x) 


< E 


h Me (R {k ’ m) ) ~ h Mc (R (k) ) 


k.m 


< w(m- 7 ) + K e E [l (| R^ - F (/c) I > m -A 

< ca(m -7 ) + K e m' y di(F krn , F k ), 


k.m 


where 2K e = sup{|/iM e (a;)| : x € M}. Choose 0 < 7 < 1 — 1/a for the a € (1, 2) in Theorem 4.3 and 
combine the previous estimates to obtain 

E / h(x)dF k}m (dx) — / h(x)dF k (dx 

_ J M J M 

< 2(7(1 + M e ) (E[F Km (-M e )} - F k (-M e ) + F fc (M e ) - F[F fc , m (M e 
+ e + w(m- 7 ) + (2 C + K e mF)E U (F k ,m, F k ) 


Since 


by (4.4), and wFE 


E[F kim {-M e )\ -7 F k (—M e ) and F[F fc , m (M e )] -7 F fc (M e ) 
^1 {F km , F/c) 


0 by Theorem 


4.3 


it follows that 


lim sup F 

m—> 0 o 


h(x)dF k ^ m {dx) — / h{x)dF k {dx) 


< e. 
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Figure 2: The functions Fi 0 ,iooo(a), Ao,200(a) and Ao,1000(a). 

Since e > 0 was arbitrary, the convergence in L ] . and therefore in probability, follows. ■ 


5 Numerical examples 

This last section of the paper gives a numerical example to illustrate the performance of our 
algorithm. Consider a generic branching vector ( Q , N, Ci, C 2 , ■ ■ ■) where the {Ci}i>\ are i.i.d. and 
independent of N and Q , with N also independent of Q. 

Figure [ 2 ] plots the empirical cumulative distribution function of 1000 samples of R ^ 10 , i.e., Iq 0,1000 
in our notation, versus the functions Ao,200 and Ao.iooo produced by our algorithm, for the case 
where the Ci are uniformly distributed in [0,0.2], Q uniformly distributed in [0,1] and N is a Poisson 
random variable with mean 3. Note that we cannot compare our results with the true distribution 
F10 since it is not available in closed form. Computing Fio.iooo required 883.3 seconds using Python 
with an Intel i7-4700MQ 2.40 GHz processor and 8 GB of memory, while computing Aoqooo required 
only 2.1 seconds. We point out that in applications to information ranking algorithms E[N] can be 
in the thirties range, which would make the difference in computation time even more impressive. 

Our second example plots the tail distribution of the empirical cumulative distribution function 
of A 10 ) for 10,000 samples versus the tail of Ao,10000 for an example where N is a zeta random 
varialbe with a probability mass function P(N = k ) oc k ~ 2 ' 5 , Q is an exponential random variable 
with mean 1, and the Ci have a uniform distribution in [0, 0.5]. In this case the exact asymptotics 
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Figure 3: The functions 1 — Fio,ioooo( x ')- 1 ~ Aopoooo (%) and G'io(x), where G\q is evaluated only 
at integer values of x and linearly interpolated in between. 


for P(R,( k > > x) as x —> 00 are given by 


P(R.W > x) ~ 


(EjC^EjQjr 

(1-Pi) a 


k 


X] Pa(l — Pi~ j ) a P{N > x), 
3=0 


where P(N > x) = x a L(x ) is regularly varying (see Lemma 5.1 in [7]), which reduces for the 
specific distributions we have chosen to 


G 10 (x) ± 


(0.25) 2 - 5 
(1 - (0.49)) 2 - 5 


10 

X^(0.07) j (l - (0.49) 10 - j ') 2 - 5 P(1V > x) 
3 =0 


(0.365) P{N > x). 


Figure [i] plots the complementary distributions of F 10 , 10000 , Ao, 10000 and compares them to G. We 
can see that the tails of both F 10,10000 and Ao, 10000 approach the asymptotic roughly at the same 
time. 
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